Scaling Entity Resolution to Large, Heterogeneous Data with Enhanced Meta-blocking

Authors

  • George Papadakis
  • George Papastefanatos
  • Themis Palpanas
  • Manolis Koubarakis

Abstract

Entity Resolution constitutes a quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking in order to significantly increase precision at a limited cost in recall. Yet, Meta-blocking can be time-consuming, while its precision remains poor for configurations with high recall. In this work, we propose new meta-blocking methods that improve precision by up to an order of magnitude at a negligible cost in recall. We also introduce two efficiency techniques that, when combined, reduce the overhead time of Meta-blocking by more than an order of magnitude. We evaluate our approaches through an extensive experimental study over six real-world, heterogeneous datasets. The results indicate that our new algorithms outperform all existing meta-blocking techniques, as well as the state-of-the-art methods for block processing, in all respects.
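To make the blocking and Meta-blocking steps concrete, here is a minimal Python sketch of the generic pipeline the abstract builds on. It is not the enhanced methods proposed in the paper. It assumes schema-agnostic token blocking (every token of every attribute value is a block key) and a blocking graph whose nodes are entities and whose edges link pairs that share a block. Each edge is weighted by the number of shared blocks, in the style of a Common Blocks Scheme. Edges are then pruned by keeping those at or above the mean weight, in the spirit of Weighted Edge Pruning. The function names and the toy profiles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Schema-agnostic token blocking (assumed variant): every distinct token
    in any attribute value becomes a block key; only blocks holding two or
    more entities are kept, since singleton blocks yield no comparisons."""
    blocks = defaultdict(set)
    for eid, attrs in profiles.items():
        for value in attrs.values():
            for token in value.lower().split():
                blocks[token].add(eid)
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def blocking_graph(blocks):
    """Build the blocking graph as an edge -> weight map: an edge links every
    pair of entities that co-occur in a block, and its weight counts the
    blocks they share (redundant comparisons collapse into one edge)."""
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_edge_pruning(weights):
    """Keep only the edges whose weight reaches the mean edge weight
    (a simple global-threshold pruning rule, assumed for illustration)."""
    if not weights:
        return set()
    mean = sum(weights.values()) / len(weights)
    return {edge for edge, w in weights.items() if w >= mean}


# Hypothetical toy profiles with heterogeneous attribute names.
profiles = {
    "e1": {"name": "John Smith", "city": "Athens"},
    "e2": {"fullName": "J. Smith", "location": "Athens Greece"},
    "e3": {"name": "Mary Jones", "city": "Paris"},
    "e4": {"title": "Jones Mary", "place": "Paris France"},
    "e5": {"name": "John Doe", "city": "Paris"},
}

blocks = token_blocking(profiles)
weights = blocking_graph(blocks)
retained = weighted_edge_pruning(weights)

n = len(profiles)
print("brute-force comparisons:", n * (n - 1) // 2)  # 10
print("distinct candidate pairs:", len(weights))      # 5
print("pairs after pruning:", sorted(retained))       # [('e1', 'e2'), ('e3', 'e4')]
```

On these five toy profiles, brute-force resolution needs 10 comparisons. The blocks yield 5 distinct candidate pairs, and pruning keeps only the 2 pairs that share the most blocks. Swapping in a different weighting scheme or pruning rule changes only the last two functions.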

Similar resources

Parallel meta-blocking for scaling entity resolution over big heterogeneous data

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further increase efficiency, Meta-blocking is being used...

BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution

Identifying records that refer to the same entity is a fundamental step for data integration. Since it is prohibitively expensive to compare every pair of records, blocking techniques are typically employed to reduce the complexity of this task. These techniques partition records into blocks and limit the comparison to records co-occurring in a block. Generally, to deal with highly heterogeneou...

Boosting the Efficiency of Large-Scale Entity Resolution with Enhanced Meta-Blocking

Entity Resolution constitutes a quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking to raise precision at a limited cost in recall. At the core of this procedure lies the blocking graph, where the nodes correspond to entities and the edges connect the comparable pairs. There are several configurations for M...

Supervised Meta-blocking

Entity Resolution matches mentions of the same entity. Being an expensive task for large data, its performance can be improved by blocking, i.e., grouping similar entities and comparing only entities in the same group. Blocking improves the run-time of Entity Resolution, but it still involves unnecessary comparisons that limit its performance. Meta-blocking is the process of restructuring a blo...

A new 2D block ordering system for wavelet-based multi-resolution up-scaling

A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine resolution models. Due to the high computational cost of simulating such models, single resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of losing the precision. Several multi-scale ...

Publication date: 2016